You are given an array of words where each word consists of lowercase English letters.
wordA is a predecessor of wordB if and only if we can insert exactly one letter anywhere in wordA without changing the order of the other characters to make it equal to wordB.
- For example,
"abc"is a predecessor of"abac", while"cba"is not a predecessor of"bcad".
A word chain is a sequence of words [word1, word2, ..., wordk] with k >= 1, where word1 is a predecessor of word2, word2 is a predecessor of word3, and so on. A single word is trivially a word chain with k == 1.
Return the length of the longest possible word chain with words chosen from the given list of words.
Example 1:
Input: words = ["a","b","ba","bca","bda","bdca"] Output: 4 Explanation: One of the longest word chains is ["a","ba","bda","bdca"].
Example 2:
Input: words = ["xbc","pcxbcf","xb","cxbc","pcxbc"] Output: 5 Explanation: All the words can be put in a word chain ["xb", "xbc", "cxbc", "pcxbc", "pcxbcf"].
Example 3:
Input: words = ["abcd","dbqca"] Output: 1 Explanation: The trivial word chain ["abcd"] is one of the longest word chains. ["abcd","dbqca"] is not a valid word chain because the ordering of the letters is changed.
Constraints:
1 <= words.length <= 10001 <= words[i].length <= 16words[i]only consists of lowercase English letters.
Average Rating: 4.83 (36 votes)
Solution
Overview
A word chain is a sequence of words (word1 -> word2 -> word3 -> word4 -> word5......) such that word1 is a predecessor of word2 and so on. A key point in the problem statement is that word1 can be a predecessor of word2 if and only if we can add exactly one letter anywhere in word1 to make it equal to word2. In other words, word2 should have one letter more than word1 and the position of this new letter can be anywhere. Note that the order of the words in the list does not need to be maintained while creating the word sequence.
Suppose that word1 is ab then word2 can be ab*, a*b, *ab where * is any lowercase English letter.
Therefore, it is possible for a particular word to have more than one predecessor in the given list, and thus belong to more than one word sequence. Our objective is to determine the length of the longest possible word sequence.
Let us consider the following example : ['abcd','abc','bcd','abd','ab','ad','b']
In this list, the immediate predecessors of abcd are ['abc','bcd','abd'] as all these words are missing exactly one letter from the word abcd.
Similarly, the immediate predecessors of abd are ['ab','ad'] and the predecessor of ab is ['b'].
Approach 1: Top-Down Dynamic Programming (Recursion + Memoization)
Intuition
If you're not familiar with DFS (Depth First Search), check out our Explore Card.
Here we work backwards to find the longest chain, this means that we will start from a word and delete one character at a time. We continue this chain until we come across a word that is not present in the list or is one letter long.
In the above example some of the possible word sequences are: abcd -> abd -> ab -> b , abcd -> abc -> ab -> b , abcd -> bcd and so on.
The possible word sequences are illustrated in Figure 1.
Figure 1. Figure demonstrating DFS to find the longest word sequence.
In this graph, we can observe that the length of the longest possible word sequence is 4. There are two word sequences that have the longest length : abcd -> abd -> ab -> b and abcd -> abc -> ab -> b. (The longest path is shown in the diagram with red arrows).
Notice that a particular sequence can be a part of more than one word sequence. For example the sequence ab -> b is part of both the following sequences : abcd -> abd -> ab -> b and abcd -> abc -> ab -> b. This leads to repeated calculations because every time we encounter ab we need to explore the subpath ab -> a. For a small list, this is not a problem but as the size of the list increases, the size of the graph grows exponentially.
What we can do is whenever we encounter a new word, we will find all possible sequences with this word as the last word in the sequence. Then, we will store the length of the longest possible sequence that ends with this word.
We will use a map for this where each key will be an ending word and the value will be the length of the longest possible word sequence ending with this word. In the above example when we first encounter the word ab we will store the value 2 (word sequence ab -> b) for key ab. The next time we encounter ab, we will simply return the value stored against it in the map instead of going through the entire subtree again. This process is known as memoization and it prevents recalculation. For every word present in the list, we only need to determine the length of the longest path that ends with this word once.
Algorithm
-
Initialize a
set(wordsPresent) and add all the words in the list to the set. This set will be used to check if a word is present in the list. -
Initialize a map (
memo) havingkeytype asStringandvaluetype asInteger. This map will store the length of the longest possible word sequence where thekeyis the last word in the sequence. -
Iterate over the list. For each word in the list perform a depth-first search.
-
In the DFS, consider the current word (
currentWord) as the last word in the word sequence. -
If
currentWordwas encountered previously we just return its corresponding value in the mapmemo. -
Initialize
maxLengthto 1. -
Iterate over the entire length of the
currentWord.- Create all possible words (
newWord) by taking out one character at a time. - If
newWordis present in thesetperform a DFS with this word and store the intermediate result in a variablecurrentLength. - Update the
maxLengthso that it contains the length of the longest sequence possible where thecurrentWordis the end word.
- Create all possible words (
-
Set the
maxLengthas thevalueforcurrentWord(key) in the map. -
Return
maxLength.
Implementation
Complexity Analysis
Let N be the number of words in the list and L be the maximum possible length of a word.
-
Time complexity: O(L2⋅N).
Initially, we iterate over the list to store all the given words in a
set(adds N to the complexity).Next, we perform a DFS for each word (O(N)). For each word, we iterate over its length(O(L)). At each index (
i) we create a new word by deleting the character at positionifrom the original word (O(L)). Therefore, the overall time complexity is O(N+(L2⋅N)) = O(L2⋅N)), because the N term is insignificant relative to the L2⋅N term. Note that because of memoization we can be sure that each word in the list is traversed only once. -
Space complexity: O(N).
The extra space is used by the recursion call stack. In worst case all the words are a part of the longest word sequence which requires a recursion stack size of N.
Also, we use a
setto store all distinct words (size N) and amapto store intermediate results (size N). Since the maximum number of distinct words will be N (when there is no repetition) the overall space complexity is O(2⋅N) which in Big O notation equals O(N).
Approach 2: Bottom-Up Dynamic Programming
Intuition
In this solution, we will create the word sequence by adding one letter at a time to the last word in the sequence. Thus the resulting word sequence will be a series of words where each word has one more letter than its predecessor.
If we know the length (previousLength) of the longest word sequence that ends with a word we can use this value to find the length of the longest word sequence for its successor(newLength = previousLength + 1).
Let us again consider the above example ['abcd','abc','bcd','abd','ab','ad','b'].
The longest word sequence with the word b is 1. Thus the length of the longest word sequence with the word ab will be 1 + 1 = 2 (ab -> b). This result can in turn be used to find the length of the longest word sequence for the word abc (2 + 1 = 3 for sequence abc -> ab -> b).
The length of the words in a sequence increases as we move from left to right. Also, we know that the order of the words in the list doesn't matter. So we can sort the words in ascending order based on their length. Next, we can iterate over the sorted list and calculate the length of the longest sequence possible where the word at index i is the end word. We store this result in a map where key is the word and value is the sequence length. By doing this we ensure that, for each word that we encounter, we already know the result of all of its possible predecessors. This process is demonstrated in the following animation.
Algorithm
-
Initialize a map where
keyis the word andvalueis the length of the longest word sequence possible with thekeyas the end word. -
Sort the word list in increasing order of the word length.
-
Initialize
longestWordSequenceLengthto 1. This variable holds the length of the longest word sequence possible. -
Iterate over the sorted list.
-
For each word initialize
presentLengthto 1. -
Iterate over the entire length of each word.
- Delete the character at ith position from the current word and assign the new word to the variable
predecessor. - Check if
predecessoris present in the list or not. - If the
predecessoris present, then assign its mapped value topreviousLength. Update thepresentLengthifpreviousLength + 1is greater than thepresentLength.
- Delete the character at ith position from the current word and assign the new word to the variable
-
After terminating the inner
forloop, assignpresentLengthto the current word in the mapdp. -
Update the
longestWordSequenceLengthif the longest word sequence formed with the current word as the end word is longer than the previously considered word sequence. -
After terminating the outer
forloop, returnlongestWordSequenceLength.
Implementation
Complexity Analysis
Let N be the number of words in the list and L be the maximum possible length of a word.
-
Time complexity: O(N⋅(logN+L2)).
Sorting a list of size N takes O(NlogN) time. Next, we use two for loops in which the outer loop runs for O(N) iterations and the inner loop runs for O(L2) iterations in the worst case scenario. The first L is for the inner loop and the second L is for creating each
predecessor. Thus the overall time complexity is O(NlogN+(N⋅L2)) which equals O(N⋅(logN+L2)). -
Space complexity: O(N).
We use a map to store the length of the longest sequence formed with each of the N words as the end word.
Am I the only one who feels too dumb after not being able to come up with any solution for this 'medium' problem? :(...
Last Edit: May 11, 2021 10:14 PM
Neat and clear Explanation! Keep it up! ..Wish the majority of official solution articles were like this..
Very well explained.
Just one more addition in case it makes anyone curious, It can be solved using the concept of LIS(Longest Increasing Subsequence) also provided you sort the input first in ascending order of length.
Link : https://leetcode.com/problems/longest-string-chain/discuss/1213820/C%2B%2B-(Why-it-looks-like-LIS-pattern-(easy-comments))-%3A-Memoized-greaterDP
You can expect this question in an interview. There are at least 3 methods of solving this question.
can someone enlighten me about the time complexity of substr fn in c++? In my opinion, it does take linear time to fetch u the substring, but I don't see it being counted in time complexity as of the whole. Why is it so?? am I wrong with this concept? If yes plz correct me.
Last Edit: May 18, 2021 11:30 AM
Approach 2: word may be searched first from dp before removing any char, helpful if there are any duplicated word in the given words list:
for (const string &word : words) {
int presentLength = 0;
if (dp.find(word) != dp.end()) {
presentLength = dp[word];
}
else {
presentLength = 1;
for (int i = 0; i < word.length(); i++) {
...
}
dp[word] = presentLength;
}
May 18, 2021 11:16 AM
I'm curious to know how you decided to approach the problem by deleting characters and going backwards, I spent a very very long time trying to add characters.
Last Edit: May 18, 2021 8:05 AM
I think the complexity for the 1st approach would be N^2 * L^2. Consider the max length of sequence be half of the words, i.e.., N/2. Now we are performing dfs on each word so outer for loop costs O(N). For the reasons mentioned in the solution, there is L^2 complexity in each dfs loop, but we are also recurring again at maximum till depth N/2 in this example, if the next word is present in the hashset which would cost additional O(N/2). so the complexity for this case will be O(N * N/2 *(L^2)).
Nice article!
One further optimisation for the bottom-up DP solution: hash the words by their length and skip the search, if for a group of words of length L, there's no L - 1 words exist (O(1) look-up with the extra hash table). This reduces the time complexity to O(U*logU + N*L^2) where U <= N is the number of unique word length values.
The following code runs below 95ms and faster than 99% submissions.
def longestStrChain(self, words: List[str]) -> int:
from collections import defaultdict
# dp represents longest chain length with the key word being the end of the chain
dp: Dict[str, int] = {}
len_hashed = defaultdict(set)
for w in words:
len_hashed[len(w)].add(w)
dp[w] = 1
res = 1
# Optimisation1 - allows faster sorting if there are words of the same length
for i in sorted(len_hashed.keys()):
# Optimisation 2 - allows skipping multiple words and their substring search if i - 1 length words don't exist
if i - 1 not in len_hashed:
continue
for w in len_hashed[i]:
for j in range(i):
temp = w[:j] + w[j + 1:]
if temp in dp:
dp[w] = max(dp[w], dp[temp] + 1)
res = max(dp[w], res)
return res
Time Submitted | Status | Runtime | Memory | Language |
|---|---|---|---|---|
| 05/17/2021 13:02 | Accepted | 84 ms | 17.3 MB | cpp |
| 04/05/2021 12:39 | Accepted | 84 ms | 17 MB | cpp |
| 04/05/2021 12:35 | Compile Error | N/A | N/A | cpp |
xxxxxxxxxxclass Solution {public: int longestStrChain(vector<string>& words) { }};
